Mask R-CNN model


Image Segmentation and Classification of E-waste for Training Robots for Waste Segregation

Tripathi, Prakriti

arXiv.org Artificial Intelligence

Industry partners provided a problem statement that involves classifying electronic waste using machine learning models, to be utilized by pick-and-place robots for waste segregation. This was achieved by taking common electronic waste items, such as a mouse and a charger, unsoldering them, and taking pictures to create a custom dataset. The state-of-the-art YOLOv11 model was trained and achieved 70 mAP in real time. A Mask R-CNN model was also trained and achieved 41 mAP. The models can be integrated with pick-and-place robots to perform segregation of e-waste. Electronic waste (e-waste) is one of the fastest-growing solid waste streams globally [2].
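The mAP figures reported here rest on the intersection-over-union (IoU) overlap between predicted and ground-truth boxes. A minimal sketch of the IoU computation (the function name and box format are illustrative, not from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection counts as a true positive when its IoU with a ground-truth box exceeds a threshold (commonly 0.5), and mAP averages the resulting precision over recall levels and classes.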


Detecting Cadastral Boundary from Satellite Images Using U-Net model

Anaraki, Neda Rahimpour, Tahmasbi, Maryam, Kheradpisheh, Saeed Reza

arXiv.org Artificial Intelligence

Finding the cadastral boundaries of farmlands is a crucial concern for land administration. Therefore, using deep learning methods to expedite and simplify the extraction of cadastral boundaries from satellite and unmanned aerial vehicle (UAV) images is critical. In this paper, we employ transfer learning to train a U-Net model with a ResNet34 backbone to detect cadastral boundaries through three-class semantic segmentation: "boundary", "field", and "background". We evaluate the performance on two satellite images from farmlands in Iran using "precision", "recall", and "F-score", achieving high values of 88%, 75%, and 81%, respectively, which indicate promising results.
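Per-class precision, recall, and F-score of the kind reported above can be read off a confusion matrix over the three classes. A minimal sketch with hypothetical counts (the numbers are illustrative, not the paper's):

```python
import numpy as np

def class_metrics(conf, cls):
    """Precision, recall, F-score for one class from a confusion matrix
    (rows = ground truth, columns = prediction)."""
    tp = conf[cls, cls]
    precision = tp / conf[:, cls].sum()   # TP / all predicted as cls
    recall = tp / conf[cls, :].sum()      # TP / all true cls
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical pixel counts for classes: 0 = boundary, 1 = field, 2 = background
conf = np.array([[80, 15,  5],
                 [10, 85,  5],
                 [ 5,  5, 90]])
```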


Car Damage Detection and Patch-to-Patch Self-supervised Image Alignment

Chen, Hanxiao

arXiv.org Artificial Intelligence

Most computer vision applications aim to identify pixels in a scene and use them for diverse purposes. One intriguing application is car damage detection for insurance carriers, which aims to detect all car damages by comparing pre-trip and post-trip images and thus requires two components: (i) car damage detection; (ii) image alignment. First, we implemented a Mask R-CNN model to detect car damages on custom images. For the image alignment component, we propose a novel self-supervised, SimCLR-inspired Patch-to-Patch alignment approach to find perspective transformations between custom pre/post car rental images, as an alternative to traditional computer vision methods.
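A perspective transformation between two images is a 3×3 homography; applying one to point coordinates requires a homogeneous divide. A minimal sketch (the function name and the example matrix are illustrative, not from the paper):

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 perspective (homography) matrix H to an array of
    (x, y) points, dividing by the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((len(pts), 1))
    homog = np.hstack([pts, ones]) @ H.T   # (N, 3) homogeneous coordinates
    return homog[:, :2] / homog[:, 2:3]    # perspective divide

# A pure translation is the simplest homography: shift by (5, -2)
H = np.array([[1.0, 0.0,  5.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0,  1.0]])
```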


Visual based Tomato Size Measurement System for an Indoor Farming Environment

Kweon, Andy, Hu, Vishnu, Lim, Jong Yoon, Gee, Trevor, Liu, Edmond, Williams, Henry, MacDonald, Bruce A., Nejati, Mahla, Sa, Inkyu, Ahn, Ho Seok

arXiv.org Artificial Intelligence

As technology progresses, smart automated systems will serve an increasingly important role in the agricultural industry. Current vision systems for yield estimation face difficulties with occlusion and scalability, as they utilize camera systems that are large and expensive and therefore unsuitable for orchard environments. To overcome these problems, this paper presents a size measurement method combining a machine learning model and depth images captured from three low-cost RGBD cameras to detect and measure the height and width of tomatoes. The performance of the presented system is evaluated in a lab environment with real tomato fruits and fake leaves to simulate occlusion in a real farm environment. By combining the three camera views to address fruit occlusion, the system achieved a height measurement accuracy of 0.9114 and a width accuracy of 0.9443.
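Turning a pixel extent from an RGBD camera into a metric size typically uses the pinhole camera model: size = pixels × depth / focal length. A minimal sketch (the function name and the numbers are illustrative assumptions, not the paper's calibration):

```python
def real_size_mm(pixel_extent, depth_mm, focal_px):
    """Convert a pixel extent at a given depth to a metric size using the
    pinhole camera model: size = pixels * depth / focal_length."""
    return pixel_extent * depth_mm / focal_px

# A tomato spanning 100 px at 600 mm depth, with a 1000 px focal length,
# would measure 60 mm across under this model.
```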


DoPose-6D dataset for object segmentation and 6D pose estimation

Gouda, Anas, Ghanem, Abraham, Reining, Christopher

arXiv.org Artificial Intelligence

Scene understanding is essential in determining how capable intelligent robotic grasping and manipulation can become. It is a problem that can be approached using different techniques: seen object segmentation, unseen object segmentation, or 6D pose estimation. These techniques can even be extended to multi-view settings. Most of the work on these problems depends on synthetic datasets, due to the lack of real datasets big enough for training, and merely uses the available real datasets for evaluation. This encouraged us to introduce a new dataset, called DoPose-6D. The dataset contains annotations for 6D pose estimation, object segmentation, and multi-view annotations, which serve all the aforementioned techniques. The dataset contains two types of scenes, bin picking and tabletop, with bin picking being the primary motive for the collection. We illustrate the effect of this dataset in the context of unseen object segmentation and provide some insights on mixing synthetic and real data for training. We train a Mask R-CNN model that is practical for use in industry and robotic grasping applications. Finally, we show how our dataset boosts the performance of a Mask R-CNN model. Our DoPose-6D dataset, trained network models, pipeline code, and ROS driver are available online.


Evaluating Novel Mask-RCNN Architectures for Ear Mask Segmentation

Aryal, Saurav K., Barrett, Teanna, Washington, Gloria

arXiv.org Artificial Intelligence

The human ear is generally universal, collectible, distinct, and permanent. Ear-based biometric recognition is a niche and recent approach that is being explored. For any ear-based biometric algorithm to perform well, ear detection and segmentation need to be performed accurately. While significant work exists in the literature on bounding boxes, few approaches output a segmentation mask for ears. This paper trains and compares three newer models to the state-of-the-art Mask R-CNN (ResNet-101 + FPN) model across four different datasets. The Average Precision (AP) scores reported show that the newer models outperform the state-of-the-art, but no single model performs best across multiple datasets.


How to Use Mask R-CNN in Keras for Object Detection in Photographs

#artificialintelligence

The Region-based CNN (R-CNN) approach to bounding-box object detection is to attend to a manageable number of candidate object regions and evaluate convolutional networks independently on each RoI. R-CNN was extended to allow attending to RoIs on feature maps using RoIPool, leading to fast speed and better accuracy. Faster R-CNN advanced this stream by learning the attention mechanism with a Region Proposal Network (RPN). Faster R-CNN is flexible and robust to many follow-up improvements, and is the current leading framework in several benchmarks. The family of methods may be among the most effective for object detection, achieving then-state-of-the-art results on computer vision benchmark datasets. Although accurate, the models can be slow when making a prediction compared to alternative models such as YOLO, which may be less accurate but are designed for real-time prediction. Mask R-CNN is a sophisticated model to implement, especially compared to a simple or even state-of-the-art deep convolutional neural network model. Source code is available for each version of the R-CNN model, provided in separate GitHub repositories with prototype models based on the Caffe deep learning framework.
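The RoIPool operation mentioned above crops a region of a feature map and max-pools it into a fixed-size grid, so every RoI yields the same-shaped input for the downstream network. A minimal NumPy sketch (the function name and grid size are illustrative, not from any of the frameworks above):

```python
import numpy as np

def roi_pool(feature, roi, out_size=2):
    """Naive RoIPool: crop (x1, y1, x2, y2) from a 2-D feature map and
    max-pool it into an out_size x out_size grid."""
    x1, y1, x2, y2 = roi
    crop = feature[y1:y2, x1:x2]
    h, w = crop.shape
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Integer bin edges; guard against empty bins for tiny RoIs
            ys = slice(i * h // out_size,
                       max((i + 1) * h // out_size, i * h // out_size + 1))
            xs = slice(j * w // out_size,
                       max((j + 1) * w // out_size, j * w // out_size + 1))
            out[i, j] = crop[ys, xs].max()
    return out
```

The integer quantization of the bin edges is exactly what Mask R-CNN's RoIAlign later replaced with bilinear sampling to avoid misalignment.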


Object Detection Using Mask R-CNN with TensorFlow

#artificialintelligence

Mask R-CNN is an object detection model based on deep convolutional neural networks (CNNs), developed by a group of Facebook AI researchers in 2017. The model can return both the bounding box and a mask for each detected object in an image. The model was originally developed in Python using the Caffe2 deep learning library, and the original source code is available on GitHub. To support Mask R-CNN in libraries that are currently more popular, such as TensorFlow, a popular open-source project offers an implementation based on Keras and TensorFlow 1.3. Google officially released TensorFlow 2.0 in September 2019. TensorFlow 2.0 is better organized and much easier to learn compared to TensorFlow 1.x.


Snagging Parking Spaces with Mask R-CNN and Python

#artificialintelligence

I live in a great city. But like in most cities, finding a parking space here is always frustrating. Spots get snapped up quickly and even if you have a dedicated parking space for yourself, it's hard for friends to drop by since they can't find a place to park. This might sound pretty complicated, but building a working version of this with deep learning is actually pretty quick and easy. All the tools are available -- it is just a matter of knowing where to find the tools and how to put them together.


Step-by-Step Implementation of Mask R-CNN for Image Segmentation

#artificialintelligence

We can see the multiple specifications of the Mask R-CNN model that we will be using. The backbone is resnet101, as discussed earlier. The mask shape returned by the model is 28×28, as it is trained on the COCO dataset, and there are a total of 81 classes (including the background).
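The 28×28 soft mask has to be resized to the detected bounding box and thresholded before it can be pasted into the full image. A minimal nearest-neighbor sketch (the function name and the 0.5 threshold are illustrative, not the library's exact implementation):

```python
import numpy as np

def paste_mask(mask, box, image_shape, threshold=0.5):
    """Resize a fixed-size soft mask (e.g. 28x28) to a bounding box with
    nearest-neighbor sampling and paste it into a full-size binary image.
    box is (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = box
    bh, bw = y2 - y1, x2 - x1
    # Map each box pixel back to its nearest source pixel in the small mask
    ys = np.arange(bh) * mask.shape[0] // bh
    xs = np.arange(bw) * mask.shape[1] // bw
    resized = mask[np.ix_(ys, xs)] >= threshold
    full = np.zeros(image_shape, dtype=bool)
    full[y1:y2, x1:x2] = resized
    return full
```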